An RNA polymerase ribozyme that was obtained by directed evolution can propagate a functional RNA through repeated rounds of replication and selection, thereby enabling Darwinian evolution. Earlier versions of the polymerase did not have sufficient copying fidelity to propagate functional information, but a new variant with improved fidelity can replicate the hammerhead ribozyme through reciprocal synthesis of both the hammerhead and its complement, with the products then being selected for RNA-cleavage activity. Two evolutionary lineages were carried out in parallel, using either the prior low-fidelity or the newer high-fidelity polymerase. The former lineage quickly lost hammerhead functionality as the population diverged toward random sequences, whereas the latter evolved new hammerhead variants with improved fitness compared to the starting RNA. The increase in fitness was attributable to specific mutations that improved the replicability of the hammerhead, counterbalanced by a small decrease in hammerhead activity. Deep sequencing analysis was used to follow the course of evolution, revealing the emergence of a succession of variants that progressively diverged from the starting hammerhead as fitness increased. This study demonstrates the critical importance of replication fidelity for maintaining heritable information in an RNA-based evolving system, such as is thought to have existed during the early history of life on Earth. Attempts to recreate RNA-based life in the laboratory must achieve further improvements in replication fidelity to enable the fully autonomous Darwinian evolution of RNA enzymes as complex as the polymerase itself.


directed evolution | ribozyme | RNA polymerase | RNA replication


Darwinian evolution depends on the selective propagation of heritable information. In biology that information is represented by the sequence of nucleotide subunits within an RNA or DNA genome and is expressed through the set of RNAs and proteins encoded by that genome. During the early history of life on Earth, it is thought that RNA served as both the genetic material and the agent of expressed function in an era commonly referred to as the “RNA world” (1-3). At the outset, RNA replication is likely to have occurred as a non-enzymatic process by which RNA templates were copied to yield complementary strands, which in turn were copied to yield additional copies of the starting templates (4-7). Sequence variation would have arisen due to imperfect copying fidelity, and those variants that replicated most efficiently would have grown to dominate the population, until new variants with even greater fitness arose.

At some point during the early history of RNA-based evolution, it is thought that RNA evolved the ability to catalyze its own replication, acting as an RNA-dependent RNA polymerase (1-3). As the efficiency and accuracy of RNA replication improved, larger and more complex RNAs could be replicated, encompassing more sophisticated catalytic motifs and expanding the functional repertoire of the RNA world. Throughout the generations of RNA-based evolution, copying accuracy must have exceeded a critical threshold to maintain heritable information, and this threshold would have risen as the evolving RNAs increased in size and complexity (8, 9).

The mathematical relationship between replication accuracy and maximum genome length was first described by Eigen (8). Stated simply, the replication advantage of the fittest individuals in the population must be large enough to outweigh the loss caused by the fraction of their copies that contain errors. The greater the number of conserved nucleotide subunits, the higher the copying fidelity must be to produce error-free copies and thus to ensure that genomic information can be maintained over successive generations.
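Eigen's condition can be made concrete under a simplifying assumption: each nucleotide is copied correctly with probability q, independently of the others. A master sequence with ν conserved nucleotides is then reproduced without error with probability q^ν, and it is retained only if σq^ν > 1, where σ is its replication advantage over its error copies. Rearranging gives ν < ln σ / (−ln q), which is approximately ln σ / (1 − q) when errors are rare. The Python sketch below evaluates this bound; the σ and q values in it are illustrative assumptions, not measurements from this study.

```python
import math


def max_conserved_length(superiority: float, fidelity: float) -> float:
    """Upper bound on nu such that superiority * fidelity**nu > 1 (Eigen's error threshold).

    superiority: sigma, the replication advantage of the master sequence (must exceed 1)
    fidelity:    q, the per-nucleotide copying fidelity (0 < q < 1), with errors
                 assumed to be independent across positions
    """
    if superiority <= 1:
        raise ValueError("superiority must exceed 1")
    if not 0 < fidelity < 1:
        raise ValueError("fidelity must lie strictly between 0 and 1")
    return math.log(superiority) / -math.log(fidelity)


def approx_max_conserved_length(superiority: float, fidelity: float) -> float:
    """Small-error approximation of the same bound: ln(sigma) / (1 - q)."""
    return math.log(superiority) / (1 - fidelity)


if __name__ == "__main__":
    # Illustrative values only: a 10-fold advantage at two fidelities.
    for q in (0.90, 0.99):
        print(q, round(max_conserved_length(10, q), 1))
```

With σ = 10, raising q from 0.90 to 0.99 raises the bound from about 22 to about 229 conserved nucleotides, which illustrates why the tolerable size of an evolving RNA is so sensitive to fidelity.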

Although there are no known examples in biology of RNA enzymes with RNA-dependent RNA polymerase activity, considerable progress has been made in developing such enzymes in the laboratory using directed evolution methods (10-12). The most advanced of these polymerases, which are derived from the class I RNA ligase (13), can synthesize RNAs containing more than 100 nucleotide subunits (14-17). Although this improved polymerase activity enables the synthesis of a variety of functional RNAs, including aptamers and ribozymes, the fidelity of synthesis, especially for more complex RNAs, has remained low (14, 15, 17, 18). However, now that the polymerase can synthesize larger functional RNAs, directed evolution can select for polymerases with higher fidelity by requiring them to synthesize longer products without introducing deleterious mutations that would disrupt the function of those products.

PNAS 2024 Vol. 121 No. 11 e2321592121

https://doi.org/10.1073/pnas.2321592121

Significance

An RNA enzyme with RNA polymerase activity was used to replicate and evolve an RNA enzyme with RNA-cleavage activity. The fidelity of the polymerase is sufficient to maintain heritable information over the course of evolution, with a succession of variants of the RNA-cleaving RNA enzyme arising that have progressively increasing fitness. The RNA-catalyzed evolution of functional RNAs is thought to have been central to the early history of life on Earth and to the possibility of constructing RNA-based life in the laboratory.

Author affiliations: The Salk Institute, La Jolla, CA 92037

Author contributions: N.P., D.P.H., and G.F.J. designed research; N.P. and D.P.H. performed research; N.P., D.P.H., and G.F.J. analyzed data; and D.P.H. and G.F.J. wrote the paper.

The authors declare no competing interest.
This article is a PNAS Direct Submission.

Copyright © 2024 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

To whom correspondence may be addressed. Email: dhorning@salk.edu or gjoyce@salk.edu.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2321592121/-/DCSupplemental.

Published March 4, 2024.

During the most recent rounds of directed evolution, which are reported here, the polymerases were required to synthesize the class I ligase. This requirement placed unprecedented selection pressure on the accuracy of synthesis because even a few mistakes would result in an inactive product. As a consequence of these efforts, the fidelity of synthesis improved substantially, making it possible, for the first time, to use RNA enzymes to replicate and evolve other functional RNAs. As a demonstration of this capability, both the prior low-fidelity and the newer high-fidelity polymerase ribozymes were challenged to evolve the hammerhead ribozyme.

The hammerhead is a small, self-cleaving RNA that contains ~34 nucleotides, 12 of which are strictly conserved (19, 20). This size was found to place the hammerhead above the error threshold for the low-fidelity polymerase, but within the error threshold for the high-fidelity polymerase. In the present study, eight rounds of RNA-catalyzed evolution were carried out in parallel using the two polymerases. During each round, the hammerhead RNA was copied by the polymerase to yield its complement, which in turn was copied by the polymerase to generate new hammerhead ribozymes, which were required to catalyze an RNA-cleavage reaction. Thus, the fitness of the evolving hammerheads was determined by both their replicability and catalytic activity. Following each round of evolution, the RNAs were reverse transcribed and PCR amplified, which provided an archive of materials that were analyzed by deep sequencing to determine the course of hammerhead evolution.

The low-fidelity polymerase was unable to maintain the essential features of the hammerhead motif, with heritable information becoming lost to entropic decay after only a few generations of evolution. In contrast, the high-fidelity polymerase not only sustained the evolving population of functional hammerheads, but also led to the emergence of novel variants with improved fitness compared to the starting molecule, progressively accruing advantageous mutations. These improvements in fitness reflect a compromise between replicability and catalytic activity, with the evolved hammerheads having somewhat reduced catalytic activity, but a greater increase in their ability to be copied by the polymerase.

These results are relevant to the RNA world hypothesis, whereby the higher the fidelity of the polymerase, the larger the functional RNA it is able to synthesize; and the larger the functional RNA it is able to synthesize, the more opportunity there is to evolve complex motifs with even greater fidelity (3). This process may also be the path to the laboratory evolution of RNA polymerases with sufficient activity and fidelity to catalyze the evolution of RNAs as large and complex as the polymerase itself.


Results 


Evolution of Polymerases with Improved Activity and Fidelity. A directed evolution procedure was developed that requires the polymerase ribozyme to synthesize a functional copy of its evolutionary ancestor, the class I RNA ligase (13). The class I ligase is a challenging target for the polymerase: the previous most advanced variant, the 52-2 polymerase, can synthesize it in only 2% yield after 24 h, with a fidelity of 84.1% per nucleotide (17). The class I ligase is a complex motif, containing 33 highly conserved and 41 partially conserved nucleotides (21). As a consequence, the ensemble of ligase ribozymes synthesized by the 52-2 polymerase is >1,000-fold less active than the ensemble synthesized by a protein polymerase (17). Therefore, by requiring the polymerase ribozyme to synthesize a functional ligase, one can impose stringent selection pressure to direct the evolution of polymerases with improved fidelity (Fig. 1A).

The population of RNA polymerases obtained after the prior 52 rounds of directed evolution (17, 18, 22), from which the 52-2 polymerase was isolated, was covalently linked to an RNA primer (Primer1) and challenged to extend that primer on a separate, 3’-biotinylated RNA template (Tem2) using NTP substrates to yield the class I ligase (for sequences of oligonucleotides, see SI Appendix, Table S1). Following removal of the template RNA, the polymerase-linked extension products were captured on magnetic beads by hybridization to a bead-bound DNA primer (Rev2) that is complementary to the 3’ end of the polymerase. The DNA primer was extended by reverse transcriptase to inactivate the polymerase as an RNA-cDNA hybrid, thus preventing it from participating in the subsequent ligation reaction. One substrate for the ligase (S1) was attached to the cDNA and the other, which was 3’-biotinylated (S2), was provided free in solution. The ligation reaction was allowed to proceed for 30 min, after which the cDNA was released from the beads and any cDNAs that had been labeled by an active ligase were captured on streptavidin, then selectively copied to yield the opposing strand of DNA, which was PCR amplified. The primers for amplification were specific to the ligated products, adding further stringency to the selection process. The amplified DNA was then forward transcribed to yield a progeny population of polymerase ribozymes to begin the next round of directed evolution.

The evolution procedure was continued for 18 rounds, progressively decreasing the time allowed for polymerization from 20 to 1 h and reducing the concentration of Mg2+ from 200 to 83 mM. Mg2+ is a catalytic cofactor for the polymerase (23), and at high concentrations has been shown to reduce polymerase fidelity, presumably by stabilizing base mismatches (17). Following the final round of evolution (71 in total), the population was sequenced at depth across all of the rounds. A dominant sequence containing 10 mutations relative to the 52-2 polymerase, named 71-89 (Fig. 1B), was highly enriched over the later rounds of evolution and was chosen for comparison to the 52-2 polymerase in subsequent RNA-catalyzed evolution studies.


RNA-Catalyzed Evolution of Hammerhead Ribozymes. The 52-2 polymerase and its predecessors can synthesize small functional ribozymes in a single-pass reaction, but are incapable of replicating those RNAs, which requires reciprocal synthesis of both the functional plus strand and the complementary minus strand. Polymerase fidelity plays a more critical role in RNA replication than in RNA synthesis because errors can be introduced during the synthesis of either strand and thereby disrupt function. In addition, there are idiosyncratic sequence-dependent effects in the polymerization reaction (17), such that the efficient and accurate synthesis of a plus strand does not ensure similarly efficient and accurate synthesis of a corresponding minus strand.

The 52-2 polymerase is capable of synthesizing the hammerhead ribozyme in the presence of either 200 or 50 mM Mg2+, but the higher concentration is required for good yield and the lower concentration is required for good fidelity (17). However, the polymerase struggles to synthesize the minus strand, even in the presence of 200 mM Mg2+. Furthermore, reciprocal synthesis of plus and minus strands requires the polymerase to synthesize a primer binding site within each strand, flanking the hammerhead motif, which is beyond the capability of the 52-2 polymerase. Thus, to compare the ability of the 52-2 and 71-89 polymerases to propagate functional RNAs, it was necessary to modify the hammerhead ribozyme so that it could be replicated by the 52-2 polymerase.

Fig. 1. Directed evolution of polymerase and hammerhead ribozymes. (A) Scheme for selective amplification of polymerase ribozymes that synthesize a functional class I ligase ribozyme. 1) Attachment to the polymerase of an RNA primer (magenta), which binds to an RNA template (brown) that encodes the class I ligase; 2) extension of the primer by polymerization of NTPs (cyan); 3) reverse transcription of the polymerase; 4) attachment of one RNA substrate for ligation (orange) to the 3’ end of the polymerase cDNA and hybridization of that substrate to a 5’-biotinylated (green) RNA template that is linked to the other substrate for ligation (orange); 5) capture of the ligated products on streptavidin magnetic beads (gray); and 6) PCR amplification and transcription to generate progeny polymerases. (B) Sequence and secondary structure of the 71-89 RNA polymerase. Orange circles indicate mutations that arose during evolution leading to the 52-2 polymerase (17); red circles indicate mutations that arose from the 52-2 to the 71-89 polymerase; nucleotides in gray were added during polymerase evolution. (C) Sequence and secondary structure of the hammerhead bound to an RNA substrate. Stem elements I-III are labeled, strictly conserved nucleotides are shown in red, and the 5’ and 3’ primer-binding regions are shown in magenta and brown, respectively. The arrow indicates the substrate cleavage site. (D) Scheme for RNA-catalyzed selective amplification of hammerhead ribozymes that cleave an attached RNA substrate. 1) Extension of an RNA primer (brown) by polymerization of NTPs on an RNA template that encodes HHR- RNA; 2) strand separation and hybridization of a second RNA primer (magenta), with attached 3’-biotinylated RNA substrate; 3) extension of the second primer to generate HHR+ RNA; 4) biotin capture on streptavidin magnetic beads; 5) cleavage of the attached RNA substrate, releasing the hammerhead from the beads; 6) reverse transcription, PCR amplification, and forward transcription to begin the next round of selective amplification.

For synthesis of the minus strand (HHR-), which is the complement of the hammerhead, the primer was designed to include the 5’-substrate binding arm and the last two nucleotides of the conserved “GAAA” portion of the motif. In addition, the central stem of the hammerhead (stem I) was modified so that it could be copied more readily, while still retaining hammerhead activity (Fig. 1C). This variant of the hammerhead, which will henceforth be referred to as “Seq0,” can be replicated by both the 52-2 and 71-89 polymerases. For 52-2, the reactions require 200 mM Mg2+ and generate HHR+ and HHR- in 18% and 23% yield, respectively, after 24 h. The 71-89 polymerase, which was evolved to operate with higher fidelity in the presence of 50 mM Mg2+, yields 11% HHR+ and 7% HHR- after 24 h.

The two polymerases have a significantly different fidelity of synthesis. Beginning with the Seq0 hammerhead ribozyme as a template, the 52-2 polymerase synthesizes HHR- (in 200 mM Mg2+) with a single-pass fidelity of 85.6% per nucleotide. The resulting HHR- RNAs in turn direct the synthesis of new copies of HHR+ with 81.4% overall fidelity for the reciprocal synthesis of HHR- and HHR+. In contrast, the 71-89 polymerase (in 50 mM Mg2+) synthesizes HHR- with 90.9% single-pass fidelity, and carries out reciprocal synthesis of HHR- and HHR+ with 89.1% overall fidelity (SI Appendix, Tables S2 and S3). This substantial difference in copying fidelity has important consequences for the preservation of functional sequence information over the course of successive rounds of selective amplification.
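One rough way to gauge this difference is to ask what fraction of copies keep all 12 strictly conserved hammerhead nucleotides intact after one round of replication. The sketch below assumes, for illustration only, that the overall fidelities quoted above apply independently at every position, and it ignores reversions; it is a back-of-the-envelope estimate, not the analysis used in this study.

```python
def fraction_intact(fidelity: float, conserved_positions: int) -> float:
    """Probability that every conserved position is copied correctly.

    Assumes each position is copied correctly with probability `fidelity`,
    independently of all other positions (a simplifying assumption).
    """
    if not 0 <= fidelity <= 1:
        raise ValueError("fidelity must be a probability")
    return fidelity ** conserved_positions


if __name__ == "__main__":
    # Overall per-round fidelities quoted in the text; 12 strictly conserved nucleotides.
    for polymerase, q in (("52-2", 0.814), ("71-89", 0.891)):
        print(polymerase, round(fraction_intact(q, 12), 3))
```

Under these assumptions, about 25% of copies made by the 71-89 polymerase retain every strictly conserved nucleotide, compared with about 8% for the 52-2 polymerase, so selection for activity has roughly threefold more intact material to work with in each round.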

Two parallel lineages of RNA-catalyzed evolution were carried out, using either the 52-2 or 71-89 polymerase in the presence of either 200 or 50 mM Mg2+, respectively, to replicate the hammerhead RNA (Fig. 1D). Starting with Seq0 HHR+ RNA, the polymerases first were required to synthesize complementary HHR- RNAs. Then the HHR+ template was removed, the full-length HHR- products were gel purified, and the HHR- RNAs were used as templates to synthesize new copies of HHR+ RNAs.

The primer for HHR+ synthesis (Primer3) was 5’-biotinylated and linked to the substrate RNA for the hammerhead ribozyme. The HHR+ products were captured on streptavidin-coated beads, and then incubated for 30 min in the presence of 20 mM Mg2+ to allow functional hammerheads to cleave the attached substrate, thereby becoming released from the beads. The released hammerheads were reverse transcribed and PCR amplified to provide an archive of the selected materials. A portion of these materials was forward transcribed to yield a progeny population of ribozymes to begin the next round of RNA-catalyzed replication and selection based on catalytic function. This process was continued for eight rounds for each lineage, after which materials from each round were analyzed by deep sequencing to determine the course of hammerhead evolution.


Sequence Changes Over the Course of Evolution. The two lineages of RNA-catalyzed evolution of the hammerhead ribozyme had very different outcomes, with RNA-cleavage activity becoming lost in the 52-2 lineage, while it was maintained in the 71-89 lineage. The catalytic activity of the two evolving populations of HHR+ RNAs was measured after each round in comparison to that of the Seq0 hammerhead, based on the ability to cleave a separate RNA substrate (Fig. 2 A and B). It was not feasible to measure catalytic activity under the same on-bead format as used during evolution because the activity of the 52-2 lineage quickly fell to very low levels that could only be measured using purified materials.

Over the course of evolution, the population replicated by the 52-2 polymerase exhibited progressively declining activity, falling to a barely detectable level by round 8. In contrast, for the population replicated by the 71-89 polymerase, catalytic activity was maintained throughout the eight rounds. The sequences of individual members of the two populations were matched to the biochemically defined sequence requirements for active hammerhead ribozymes (20). The same trend was observed, with the 52-2 population drifting away from sequences required to maintain catalytic activity, while the 71-89 population largely held fast to the functional motif (Fig. 2 A and B).
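In its simplest form, checking a sequence against a motif's requirements amounts to testing for required nucleotides at fixed positions in aligned sequences. The sketch below does only that, with a hypothetical toy motif; the actual hammerhead requirements (20) also include base-pairing constraints in the stems and are not reproduced here.

```python
from typing import Dict, Iterable


def conforms(seq: str, required: Dict[int, str]) -> bool:
    """True if `seq` carries the required nucleotide at every listed 0-based position."""
    return all(pos < len(seq) and seq[pos] == nt for pos, nt in required.items())


def fraction_conforming(population: Iterable[str], required: Dict[int, str]) -> float:
    """Fraction of sequences (one entry per molecule) that satisfy every requirement."""
    seqs = list(population)
    if not seqs:
        return 0.0
    return sum(conforms(s, required) for s in seqs) / len(seqs)


if __name__ == "__main__":
    # Hypothetical toy motif: 'G' at position 1 and 'A' at positions 2 to 4.
    motif = {1: "G", 2: "A", 3: "A", 4: "A"}
    pop = ["CGAAAU", "CGAAAC", "CGUAAU", "CCAAAU"]
    print(fraction_conforming(pop, motif))  # 0.5
```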

The two lineages showed distinct patterns in their accrual of sequence variation. The frequency of mutations within the evolving populations of RNAs was calculated as the Levenshtein distance from Seq0, based on the total number of substitutions, insertions, and deletions (Fig. 2 C and D). For the 52-2 lineage, more than half of the RNAs contained at least 5 mutations by round 3, and by round 6 most contained more than 10 mutations. For the 71-89 lineage, the number of mutations increased much more slowly, with an average of 3 mutations by round 3 and fewer than 5 mutations by round 8. Consistent with these observations, the final population from the 52-2 lineage could neither enrich nor even maintain specific RNA sequences, whereas that from the 71-89 lineage could enrich or deplete specific sequences, as expected for selection based on their differential fitness (SI Appendix, Fig. S1 A and B).
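The Levenshtein distance is the minimum number of single-nucleotide substitutions, insertions, and deletions needed to turn one sequence into another, and it is conventionally computed by dynamic programming. A minimal sketch follows, together with a hypothetical helper that tallies a population's distances from a reference, the quantity summarized by the violin plots; the sequences in the example are made up and are not Seq0 or its variants.

```python
from collections import Counter
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mutation_distribution(population: Iterable[str], reference: str) -> Counter:
    """Count how many sequences lie at each edit distance from the reference."""
    return Counter(levenshtein(reference, seq) for seq in population)


if __name__ == "__main__":
    ref = "GGCUGAAA"  # hypothetical reference sequence
    pop = ["GGCUGAAA", "GGCUGAA", "GGAUGAAA"]
    print(sorted(mutation_distribution(pop, ref).items()))
```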

Because the steps of reverse transcription and PCR amplification were carried out prior to each round of HHR- synthesis, it was important to determine whether enrichment or depletion of specific sequences might have been due to these protein-catalyzed steps rather than the RNA-catalyzed processes. Thus, beginning with the round 7 population from each lineage, the HHR+ RNAs were directly reverse transcribed and PCR amplified before sequencing, without involving replication by the polymerase ribozyme or RNA-catalyzed cleavage of the substrate RNA. For both the 52-2 and 71-89 polymerases, there was a roughly normal distribution of sequence changes, with a mean and SD of 0.75 ± 0.24 and 0.95 ± 0.32, respectively (SI Appendix, Fig. S1 C and D). There was no significant enrichment or depletion of particular sequences, thus demonstrating that the protein-catalyzed steps did not introduce significant selection bias.

Formal measures of population diversity confirm that the 52-2 lineage is unable to enrich selectively advantageous sequences. The Shannon population entropy is a measure of sequence diversity in a population, maximized at 1.0 when every RNA in the population has a unique sequence (24). The entropy value for the 52-2 lineage reached 0.94 by round 3 and was 0.97 at round 8, corresponding to a nearly random distribution of rare and distinct sequences. In contrast, the 71-89 lineage plateaued at an entropy value of ~0.6 by round 4 and maintained that value through round 8 (SI Appendix, Fig. S2A). For the 52-2 lineage, the average sequence difference between any two RNAs (25) was ~40% at round 3 and ~50% at round 8, whereas for the 71-89 lineage an average difference of ~20% was maintained throughout rounds 2 through 8 (SI Appendix, Fig. S2B). In summary, the population replicated by the 52-2 polymerase decayed to a state of highly divergent, low-abundance sequences, whereas the population replicated by the 71-89 polymerase became enriched with clusters of high-abundance sequences.
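The two diversity measures above can be sketched as follows. Normalizing the Shannon entropy by ln N, where N is the number of sequenced molecules, is an assumption chosen so that the value is 1.0 when every molecule is unique, matching the description above; ref. 24 may define it differently. The pairwise difference assumes aligned, equal-length sequences compared position by position, so populations with insertions or deletions would first need a multiple alignment.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable


def normalized_entropy(population: Iterable[str]) -> float:
    """Shannon entropy of sequence frequencies divided by ln(N).

    N is the number of molecules, so the result is 1.0 when every molecule
    is unique and 0.0 when all molecules share one sequence (assumed normalization).
    """
    counts = Counter(population)
    n = sum(counts.values())
    if n <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def mean_pairwise_difference(population: Iterable[str]) -> float:
    """Average fraction of mismatched positions over all pairs of molecules.

    Assumes aligned, equal-length sequences. Pairs of identical sequences
    contribute zero, so only pairs of distinct unique sequences are summed,
    weighted by the number of molecule pairs they represent.
    """
    counts = Counter(population)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    total = 0.0
    for (s, cs), (t, ct) in combinations(counts.items(), 2):
        if len(s) != len(t):
            raise ValueError("sequences must be aligned to equal length")
        mismatch = sum(x != y for x, y in zip(s, t)) / len(s)
        total += cs * ct * mismatch
    return total / (n * (n - 1) / 2)
```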
Fig. 2. RNA-catalyzed evolution of the hammerhead ribozyme. The 52-2 (orange) or 71-89 (blue) polymerase catalyzed replication of the hammerhead, with selection dependent on catalytic activity of the hammerhead. (A and B) Following each round, RNA-cleavage activity was determined for the population relative to that of Seq0 RNA (dark colored circles) and the fraction of RNA molecules consistent with the hammerhead motif was determined from the sequencing data (light colored circles). Values of the former are based on three replicates, with SE. (C and D) Violin plots indicate the distribution of mutations in the population after each round, relative to Seq0.


Emergence of Novel Hammerhead Variants. A simple clustering 
algorithm was used to identify individual sequences of high 
abundance, together with their closely related sequences, over the 
course of evolution (26). Separately, a neighbor-joining phylogeny 
was calculated for the most abundant sequences for each round of 
evolution (27). The results from these two approaches are largely 
congruent, with distinct clades or close paraphyletic groups in the 
phylogeny corresponding to distinct peaks of related sequences 
(Fig. 3A and SI Appendix, Fig. S3A). 
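The clustering algorithm of ref. 26 is not described here, so the sketch below substitutes a generic greedy scheme, labeled as such: unique sequences are visited from most to least abundant, and each one not yet assigned becomes a peak that claims every unassigned sequence within a chosen edit-distance radius. The function names and the radius are hypothetical, and the neighbor-joining phylogeny (27) is not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def greedy_clusters(population: Iterable[str], radius: int) -> Dict[str, List[str]]:
    """Group unique sequences around abundant peaks (hypothetical greedy scheme)."""
    counts = Counter(population)
    # Most abundant first; ties broken alphabetically for reproducibility.
    order = sorted(counts, key=lambda s: (-counts[s], s))
    assigned = set()
    clusters: Dict[str, List[str]] = {}
    for peak in order:
        if peak in assigned:
            continue
        members = [s for s in order
                   if s not in assigned and levenshtein(peak, s) <= radius]
        assigned.update(members)
        clusters[peak] = members
    return clusters


def cluster_abundance(population: List[str], clusters: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of all molecules that fall within each cluster."""
    counts = Counter(population)
    total = sum(counts.values())
    return {peak: sum(counts[s] for s in members) / total
            for peak, members in clusters.items()}
```

Pass the population as a list rather than a one-shot iterator when calling both functions, since each one counts the molecules independently.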

For the 52-2 lineage, there was a cluster of sequences after round 1 centered about Seq0 and comprising ~50% of all RNAs in the population. During subsequent rounds, a succession of overlapping clusters arose and faded away, each containing one to two mutations relative to Seq0 (SI Appendix, Fig. S3). The fraction of total RNAs in these peaks fell below 20% by round 3 and below 10% by round 8. Any sequence cluster that arose that was distinct from Seq0 suffered the same fate, beginning as a small fraction of the total RNAs and fading by round 8.

For the 71-89 lineage, in contrast, the majority of sequences in 
the population belonged to one of multiple prominent clusters 
throughout the course of evolution. After round 1, there was a 
single large cluster centered about Seq0. During subsequent 
rounds, new sequence clusters emerged, centered about distinct 
peak sequences (Fig. 3A). The trajectories of the most abundant 
clusters were tracked over the eight rounds (Fig. 3B). The clusters 
were named after the peak sequence within each cluster. 

The Seq0 cluster declined from the outset, initially replaced by a group of very closely related sequences that subsequently coalesced to form the Seq2 cluster, with a peak sequence containing two mutations relative to Seq0. By round 6, the Seq3 and Seq5 clusters emerged, containing five and three mutations, respectively, relative to Seq0. These two clusters remained prominent through round 8. Beginning at round 7, the Seq15 cluster expanded rapidly, increasing by more than 10-fold during the last two rounds of evolution. Seq15 contains six mutations relative to Seq0, including a single-nucleotide deletion that causes a frameshift within stem II. Finally, in round 8 the Seq35 cluster became notable, with a distinct constellation of five mutations relative to Seq0. In retrospect, Seq35 had gradually been increasing in abundance over the entire eight rounds, but did not constitute a significant fraction of the population until round 8. All of the peak sequences conform to the canonical hammerhead ribozyme motif (Fig. 3C), with no mutations at the most highly conserved nucleotide positions, two or more mutations within stem II, and zero or one mutations within stem I, which pairs with the 3’ portion of the RNA substrate.


Biochemical Properties of the Evolved Hammerhead Ribozymes. 
The Seq0, Seq2, Seq3, Seq5, Seq15, and Seq35 ribozymes were 
prepared by in vitro transcription of synthetic DNA and tested 
individually for their ability to be replicated by the 71-89 
polymerase. The synthetic HHR+ RNAs were copied by the 
polymerase to yield complementary HHR- RNAs, which in turn 
were copied by the polymerase to yield HHR+ RNAs. Separately, 
synthetic HHR+ RNAs were tested for catalytic activity in the 
RNA-cleavage reaction, conducted in the same format as during 
RNA-catalyzed evolution. 

Fig. 3. Emergence of peak sequences and clusters over the course of evolution catalyzed by the 71-89 polymerase. (A) Phylogenetic tree of sequences that reached an abundance of ≥0.5% of the population by the third round or later. Seq0 and the most abundant peak sequences are labeled. (B) Abundance by round of the peak sequence (dark shading), sequences within the same cluster as the peak sequence that share a common ancestor with the peak (medium shading; bracketed in A), and the entire cluster surrounding the peak (light shading). These data are normalized to the highest abundance reached by the cluster, with the maximum percent abundance indicated by the number at the top right of the graph. (C) Sequence and secondary structure of Seq0 and the most abundant peak sequences, with mutations highlighted by colored circles.

Fig. 4. Relative fitness of the most abundant peak variants compared to Seq0. (A) Measurements were made of HHR- synthesis (light blue bars), HHR+ synthesis (medium blue bars), and HHR+ catalytic activity (orange bars). Individual HHR+ RNAs that had been prepared by in vitro transcription were used as templates for polymerase-catalyzed synthesis of HHR-, then the resulting HHR- products were used as templates for polymerase-catalyzed synthesis of HHR+. Separately, HHR+ RNAs with attached RNA substrate were prepared synthetically and tested for RNA-cleavage activity under the same conditions as during RNA-catalyzed evolution. (B) Overall fitness was estimated by the multiplicative effects of HHR- synthesis, HHR+ synthesis, and HHR+ catalytic activity (light blue bars); also adjusting for differences in specific activity due to sequence-specific effects on copying fidelity (medium blue bars); both in comparison to fitness based on relative enrichment during a single competitive round of evolution involving Seq0 and the five peak sequences (dark blue bars). Values are based on three replicates, with SE.

All five evolved hammerhead variants are replicated more efficiently than Seq0 RNA, but are less active in the RNA-cleavage reaction (Fig. 4A). For each of these variants, there is a two- to threefold improvement in the synthesis of HHR- and a two- to eightfold improvement in the synthesis of HHR+ RNA. RNA-cleavage activity is reduced by 1.5- to threefold. The net effect, multiplying the yields from plus- and minus-strand synthesis and the requisite RNA-cleavage activity of the plus strand, is that the evolved variants have an overall three- to eightfold improvement in fitness compared to Seq0 (Fig. 4B). Seq15 has the highest overall fitness, with a three- and sevenfold improvement in HHR- and HHR+ synthesis, respectively, counterbalanced by a twofold decrease in catalytic activity.

The fitness of an evolved variant is not determined by the fitness of an individual reference sequence, but rather by the distribution of variants about that reference sequence that evolve together as a “quasispecies” (8, 9, 28). Because the 71-89 polymerase operates with 89.1% fidelity per round of replication, each round results in an average of two mutations per sequence. Thus, a quasispecies distribution is inevitable, and a better measure of fitness is the net enrichment of progeny RNAs that derive from a parental quasispecies.
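Why an average of two mutations per sequence makes a quasispecies inevitable can be illustrated by assuming that mutations arise independently and rarely at each position, so that the number per progeny sequence is approximately Poisson-distributed with a mean of 2. That distributional assumption is ours, not a measurement from this study.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for a Poisson-distributed number of mutations per sequence."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    mean_mutations = 2.0  # average mutations per sequence per round, from the text
    exact = poisson_pmf(0, mean_mutations)
    two_or_more = 1 - exact - poisson_pmf(1, mean_mutations)
    print(f"exact copies: {exact:.3f}, two or more mutations: {two_or_more:.3f}")
```

Under this assumption only about 13.5% of progeny are exact copies of their parent and nearly 60% carry two or more mutations, so every selected sequence is surrounded by a cloud of close relatives.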

With the current selection format, it is not feasible to track the direct descent of members within a complex population. Thus, to estimate the fitness of the quasispecies centered about a reference sequence, the Seq0 RNA and each of the five most prominent variants were carried through one round of RNA-catalyzed evolution, both separately and as an equimolar mixture of all six. Following HHR- synthesis, HHR+ synthesis, and selection for hammerhead activity, the population of RNAs was analyzed by deep sequencing. By comparing the sequence progression of each starting variant in isolation to that within the combined mixture, it was possible to determine the fraction of the progeny population that derives from each of the parental sequences.

The enrichment of copy number of each variant relative to Seq0 
provides a measure of fitness that is very similar to that calculated 
by multiplying the relative yields of plus- and minus-strand syn- 
thesis and RNA-cleavage activity (Fig. 4B). Only Seq35 exhibits 
substantially greater fitness as estimated from replicability and 
catalytic activity compared to competitive enrichment. The most 
rapidly growing peak sequences in the later rounds of evolution, 
Seq3, Seq5, and Seq15, have the highest fitness by either measure, 
exceeding that of Seq0 by four- to ninefold. 
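The multiplicative fitness estimate compared above can be sketched in a few lines. This is a minimal illustration, assuming each input is already expressed as a ratio to Seq0 (as plotted in Fig. 4A); the function names and the numbers in the usage lines are hypothetical, not measured values. The second function shows how a relative fitness value translates into an expected share of progeny when parents start at known frequencies, in the spirit of the prediction described in Materials and Methods.

```python
def multiplicative_fitness(minus_yield, plus_yield, activity):
    """Relative fitness as the product of three ratios to Seq0:
    HHR- synthesis yield, HHR+ synthesis yield, and RNA-cleavage activity."""
    return minus_yield * plus_yield * activity


def progeny_shares(start_freqs, fitness):
    """Expected fraction of progeny contributed by each parent after one round,
    assuming progeny abundance is proportional to starting frequency x fitness."""
    weighted = {name: start_freqs[name] * fitness[name] for name in start_freqs}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical ratios to Seq0 (illustrative only, not the values in Fig. 4A).
fit = {"Seq0": 1.0, "VariantA": multiplicative_fitness(2.0, 2.5, 0.8)}
shares = progeny_shares({"Seq0": 0.5, "VariantA": 0.5}, fit)
print(fit["VariantA"], shares)
```

A variant that is copied 2.0- and 2.5-fold better on the two strands but cleaves at 0.8 times the rate of Seq0 comes out fourfold fitter, so in an equimolar start it would be expected to contribute about 80% of the progeny.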

It is possible that certain hammerhead sequences might be 
prone to higher or lower fidelity of copying during RNA replica- 
tion or are more likely to give rise to disabling mutations. This 
possibility was investigated by examining the distribution of prog- 
eny sequences resulting from one round of replication and com- 
paring those sequences to the sequence requirements for a 
functional hammerhead (20, 29). The specific activity of each 
variant was estimated by multiplying its observed activity by the 
fraction of functional progeny. The relative fitness of each variant 
compared to Seq0, based on combined replication yield and esti- 
mated specific activity, more closely matches that based on 
observed enrichment over one round of evolution, notably cor- 
recting the overestimate of fitness for Seq35 (Fig. 4B). In general, 
however, the further inclusion of sequence-specific effects changes 
the estimates of relative fitness by an average of only ~20% for the
five peak sequences. 
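The correction for sequence-specific copying effects adds one more multiplicative factor: observed activity is scaled by the fraction of progeny that remain functional. The sketch below assumes absolute (not relative) yields and activities for each RNA, with the functional fraction taken as the share of replication products that match the hammerhead motif; the dictionary keys and all numbers are hypothetical.

```python
def specific_activity(observed_activity, functional_fraction):
    """Observed cleavage activity scaled by the fraction of replication
    products that still match the functional hammerhead motif."""
    return observed_activity * functional_fraction


def adjusted_relative_fitness(variant, reference):
    """Fitness of `variant` relative to `reference` (e.g. Seq0), combining
    HHR- yield, HHR+ yield, and estimated specific activity.

    Each argument is a dict with the (hypothetical) keys 'minus_yield',
    'plus_yield', 'activity', and 'functional_fraction'."""
    def raw(rna):
        return (rna["minus_yield"] * rna["plus_yield"]
                * specific_activity(rna["activity"], rna["functional_fraction"]))
    return raw(variant) / raw(reference)


# Illustrative numbers only.
seq0 = {"minus_yield": 0.2, "plus_yield": 0.3, "activity": 0.5, "functional_fraction": 0.8}
variant = {"minus_yield": 0.4, "plus_yield": 0.6, "activity": 0.4, "functional_fraction": 0.5}
print(adjusted_relative_fitness(variant, seq0))
```

Setting both functional fractions to 1 recovers the unadjusted multiplicative estimate, which makes it easy to see how much the correction changes each variant's relative fitness.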


Visualizing the RNA-Catalyzed Evolution of RNA. To better 
visualize the evolution of the two populations over time, the RNA 
sequences were mapped from the high-dimensional sequence 
space defined by the 27 variable nucleotide positions to a two- 
dimensional plane defined by Seq0 and reference sequences that 
emerged in the later rounds of each lineage. For the 52-2 lineage 
the reference was a highly divergent RNA that no longer conforms 
to the hammerhead motif; for the 71-89 lineage the references 
were Seq15 and Seq35, which were the most divergent of the 
peak sequences to emerge during the eight rounds of evolution. 
The density distribution of functional hammerheads within the 
two-dimensional plane can be represented as a contour map that 
encompasses all HHR+ sequences that had been observed in either 
lineage (Fig. 5). 

Visualizing the population of RNAs as a scatterplot on a 
two-dimensional plane provides a time-lapsed view of how the 
population in each lineage explores sequence space over the course 
of evolution (Movie S1). For the 52-2 lineage, the distribution of 
sequences quickly diverges from the domain of functional ham- 
merheads, with a progressively dwindling fraction of the RNAs 
clinging to the region of functionality (Fig. 5A). By round 8, 
nearly all of the sequences have drifted far from the region of 
functionality. In contrast, the 71-89 lineage proceeds through a 
succession of sequence clusters, corresponding to a quasispecies
distribution centered about Seq2, then Seq3 and Seq5, then 
Seq15, and finally Seq35 (Fig. 5B). Throughout the evolution of 
the 71-89 lineage, sequence divergence of the population increases 
over time, but occurs primarily along regions of high-level ham- 
merhead activity. Sequence diversity is shaped by selection, 
expanding into new regions of high functionality as the evolving 
population explores variants that are replicated efficiently while 
preserving hammerhead activity. 


Discussion 


The propagation of genetic information is critically dependent on
the fidelity of the copying mechanism. For the replication of RNA, 
both the plus and minus strands must be copied with sufficient 
fidelity that the selective advantage of the fittest individuals 
exceeds the probability of producing an error copy of those indi- 
viduals. The earlier 52-2 RNA polymerase has a single-pass fidelity 
of 85.6% per nucleotide and fidelity for reciprocal synthesis of 
81.4% for the variant of the hammerhead ribozyme used in this 
study. Thus, the 52-2 polymerase fails to meet the fidelity criterion, 
even for this small functional RNA. However, a directed evolution 
campaign to develop higher-fidelity polymerases was successful, 
resulting in the 71-89 polymerase with a single-pass fidelity of 
90.9% and fidelity for reciprocal synthesis of 89.1% (SI Appendix, 
Tables S2 and S3). The improved polymerase has lower fidelity 
for the synthesis of HHR- compared to HHR+, but the fidelity 
of reciprocal synthesis is sufficient to enable Darwinian evolution 
of the hammerhead’s catalytic function. 

Whereas the hammerhead ribozymes in the 52-2 lineage 
quickly diverged toward random sequences, those in the 71-89 
lineage evolved a succession of variants with increased fitness com- 
pared to the starting Seq0 hammerhead. The initial variants that 
emerged contained only two or three mutations relative to Seq0, 
whereas the later variants contained five or six mutations, which 
would have been more difficult to access in the early rounds of 
evolution. Sequence clustering and phylogenetic analysis of the 
71-89 lineage revealed that the population came to be dominated 
by a succession of five distinct peak sequences and associated clus- 
ters of closely related sequences (Fig. 3A). Two sequence clusters 
emerged at the outset, centered about peak sequences Seq2 and 
Seq5, with two and three mutations, respectively (Fig. 3 B and 
C). The Seq2 cluster remained prevalent throughout the course 
of evolution and was soon accompanied by the Seq35 cluster, with 
five mutations, which slowly became more prevalent over succes- 
sive rounds. The Seq5 cluster also remained prevalent throughout 
the course of evolution, later accompanied by the Seq3 and Seq15 
clusters with five and six mutations, respectively, the latter of 
which only reached significant abundance in the last two rounds. 

All of the most abundant evolved variants contain mutations 
that alter stem II of the hammerhead ribozyme. This stem is 
required for function, but the sequence of the stem can vary so 
long as the stem structure is maintained (20). The polymerase 
ribozyme, despite its ability to operate on a broad range of 
sequences, struggles to copy highly structured templates and has 
reduced copying efficiency even for moderately structured
templates (17). Stem II within HHR+ RNA and its complement
within HHR- RNA impose a modest obstacle to replication,
making it selectively advantageous for variants to emerge that
destabilize the stem and its complement, but not in a way that prevents
the stem from forming. Each of the major evolved variants contains 
such mutations, destabilizing base pairs within stem II or altering 
the closing UNCG tetraloop due to nucleotide substitutions or
deletions. Most of the peak sequences also contain a single mutation
within stem I that likely has similar benefit for RNA replication.
Only the 3’ portion of stem I is free to mutate because the
5’ portion constitutes part of the attached RNA substrate (Fig. 1
C and D).

Fig. 5. Scatterplots of the evolving populations of hammerhead ribozymes. For each round of evolution, catalyzed by either the (A) 52-2 or (B) 71-89 polymerase,
100,000 sampled RNA sequences were mapped onto a two-dimensional plane defined by the first (PC1) and second (PC2) principal components of variation
relative to Seq0 and three other reference sequences. Even-numbered rounds are shown here, with all eight rounds shown in Movie S1. The position of Seq0
is indicated by a white circle and contour lines indicate diminishing levels of hammerhead functionality radiating out from Seq0, with the highest and lowest
contours representing >90% and <10% functionality, respectively. Sequences present at >0.05% abundance are indicated by colored circles and individual
sequences are shown as dots colored according to the corresponding cluster. Sequences that do not belong to any cluster are shown in gray.

The fitness of each peak sequence was determined experimen- 
tally, which revealed that each is copied more readily than Seq0 
for the synthesis of both HHR- and HHR+ RNA (Fig. 4A). 
However, each peak sequence has a reduced level of RNA-cleavage 
activity compared to Seq0. In all cases, the improved replicability 
substantially outweighs the reduced activity (Fig. 4B). Seq15 and 
Seq3, which were the two fastest growing sequence clusters over 
the course of evolution, have the highest and second highest 
relative fitness, respectively, based on the combined parameters of
replicability and specific catalytic activity. These variants rank in 
the same order when fitness is determined based on relative enrich- 
ment of progeny RNAs in a competition experiment beginning 
with an equimolar mixture of Seq0 and all five peak variants. As
a consequence, the progeny of Seq15 and Seq3 each comprise 
~30% of the population following a single round of evolution, 
compared to only 4% for Seq0. 

Given the high mutation rates in both evolutionary lineages, 
the object of selection is not a single peak sequence, but rather a 
constellation of variants that include the peak sequence and closely 
related sequences that evolve together as a quasispecies (8, 9, 28). 
The structure of a quasispecies is determined both by the available 
pathways for mutational interconversion among its members and
by the underlying fitness distribution of those members. The ham- 
merhead ribozyme has been extensively studied, including with 
regard to the detailed sequence requirements for its catalytic activ- 
ity (20, 29). Thus one can predict with good accuracy the distri- 
bution of functional hammerheads in an evolving population. The 
population of RNAs can be represented as a two-dimensional 
scatterplot, superimposed on a contour map that denotes gradated 
levels of congruence with the functional hammerhead motif 
(Fig. 5 and Movie S1). 

Over successive rounds of evolution, new quasispecies emerged 
as clouds of sequences centered on a small number of 
high-abundance sequences. Due to the poor fidelity of the 52-2 
polymerase, the structure of a quasispecies cannot be maintained 
around the starting sequence, nor established around any new 
peak sequence that starts to emerge in that lineage. Instead, the 
population drifts toward random sequences, forsaking the domain 
of hammerhead functionality. In contrast, the higher-fidelity 
71-89 polymerase enables an evolutionary progression of succes- 
sive peak sequences and corresponding quasispecies that move 
progressively further from the starting sequence, while remaining 
within the central portion of the contour map for hammerhead 
functionality. 

Even though the 52-2 lineage lost hammerhead functionality, 
it could still be propagated (Fig. 2A). This behavior partly reflects 
the small but dwindling portion of the population that retains
catalytic activity, but is also due to imperfect stringency of selec- 
tion. Catalytically inactive RNAs can still be released from the 
streptavidin-coated beads due to uncatalyzed cleavage of the RNA 
substrate during the 30-min incubation in the presence of 20 mM 
MgCl2. In addition, there is likely imperfect exclusion of a small
amount of uncleaved material. These mechanisms can be significant
relative to the small amount of functional hammerheads
produced by the low-fidelity 52-2 polymerase but have little con- 
sequence compared to the strong signal of hammerhead-catalyzed 
cleavage that is maintained throughout the 71-89 lineage (Fig. 2B). 

The 71-89 polymerase contains 10 mutations relative to the 
52-2 polymerase (Fig. 1B), which result in an increase in replica- 
tion fidelity from 81.4% to 89.1% per nucleotide for a complete 
cycle of HHR- synthesis followed by HHR+ synthesis (SI Appendix,
Table S3). Much of the improved fidelity can be attributed to the
ability of the 71-89 polymerase to operate in the presence of 50
mM Mg2+, whereas the 52-2 polymerase requires 200 mM Mg2+
to replicate RNAs as complex as the hammerhead (17). A complete
cycle of replication requires reciprocal synthesis of 27 nucleotides,
which corresponds to a maximum permissible error rate of ~1/27
= 3.7% per nucleotide. The 71-89 polymerase, with a fidelity of
89.1% (an error rate of 10.9%) for reciprocal synthesis of HHR- and
HHR+, would seem to exceed this error threshold. However, experimentally determined
fitness landscapes for other RNA-cleaving ribozymes suggest that 
the maximum permissible error rate can be at least sevenfold higher 
than simply the inverse of sequence length (30). This observation 
derives in part from a high-stringency selection process that enables 
superior sequences to prevail against their competitors, thereby 
easing the error threshold by a factor of the natural log of the 
superiority of the most advantageous variants (8). In addition, the 
cooperative nature of RNA folding ensures that many mutations 
within the catalytic motif have little or no deleterious effect. The 
27-nucleotide region of the hammerhead that was subject to recip- 
rocal copying contains only 10 nucleotides that are completely 
intolerant of mutation (20, 29). 
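The error-threshold arithmetic in this paragraph can be sketched as follows. This assumes the standard quasispecies criterion, in which a master sequence of length L copied with per-nucleotide fidelity q and selective superiority σ persists when σ·q^L > 1; taking logarithms with ln q ≈ -(1 - q) gives a maximum error rate of about ln(σ)/L, which reduces to the simple 1/L rule when ln(σ) = 1. The function names are hypothetical, and the sketch treats every position as equally sensitive to mutation, which the text notes is pessimistic.

```python
import math


def max_error_rate(length, superiority=math.e):
    """Approximate per-nucleotide error threshold, ln(superiority) / length.

    With superiority = e, ln(superiority) = 1 and the threshold reduces to
    the simple 1/length rule used in the text."""
    if length <= 0 or superiority <= 1:
        raise ValueError("length must be positive and superiority must exceed 1")
    return math.log(superiority) / length


def above_threshold(fidelity, length, superiority):
    """Quasispecies criterion: the master sequence persists if the chance of
    an error-free copy, fidelity**length, times its superiority exceeds 1."""
    return superiority * fidelity ** length > 1


# 27 nt (hammerhead region) and 182 nt (71-89 polymerase) under the 1/L rule.
for n in (27, 182):
    print(n, f"{100 * max_error_rate(n):.2f}% per nucleotide")
```

Raising the superiority increases the permissible error rate only logarithmically, which is why the text also appeals to mutational tolerance of the folded RNA to explain the more permissive threshold.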

The 71-89 polymerase contains 182 nucleotides (Fig. 1B), 
which would correspond to a maximum permissible error rate of 
~1/182 = 0.5% per nucleotide for a complete cycle of plus- and
minus-strand synthesis. However, as with the hammerhead, the
error threshold would be much more permissive if selection strin- 
gency is high and the cooperative folded structure of the polymer- 
ase renders it tolerant to mutation at a significant fraction of the 
nucleotide positions. Under suitable selection conditions, a rep- 
lication fidelity of 97 to 98% would likely be sufficient to enable 
the RNA-catalyzed evolution of the polymerase ribozyme itself. 
Although it is difficult to predict the future path of evolution, 
both in nature and in directed evolution experiments, such an 
increase in fidelity does not seem unattainable. 


Materials and Methods 


Materials. The sequences of all oligonucleotides used in this study are listed 
in SI Appendix, Table S1. Synthetic oligonucleotides were either purchased from 
IDT or prepared by solid-phase synthesis using an Expedite 8909 DNA/RNA 
synthesizer, with reagents and phosphoramidites from either Glen Research or 
ChemGenes. All other RNAs, including RNA templates and polymerase ribozymes, 
were prepared by in vitro transcription (SI Appendix, Methods) and purified by 
denaturing polyacrylamide gel electrophoresis (PAGE). Assays of polymerase 
or hammerhead ribozyme activity used fluorescently labeled RNA primers or 
substrates, respectively. The reaction products were separated by PAGE, imaged 
using an Amersham Typhoon RGB laser scanner, quantified using ImageQuant 
TL software, and plotted using GraphPad Prism. Hot Start Taq, OneTaq, Q5 high-fidelity
DNA polymerase, T4 RNA ligase, T4 RNA ligase 2, K227Q T4 RNA ligase 2,
and Universal miRNA Cloning Linker were from New England Biolabs. SuperScript
IV reverse transcriptase, Turbo DNase, MyOne C1 streptavidin magnetic beads,
and Pierce high-capacity streptavidin agarose beads were from Thermo Fisher 
Scientific. Nucleoside 5’-triphosphates were from Chem-Impex International, 
pCp-biotin, γ-(2-azidoethyl)-ATP, and sulfo-cyanine5-azide were from Jena
Bioscience, and all other chemical reagents were from Sigma-Aldrich. 


Directed Evolution of Polymerase Ribozymes. Beginning with a population 
of polymerase ribozymes obtained after 52 rounds of directed evolution (17), 
error-prone PCR was performed (31) and a new 5’-terminal region was added 
during PCR using primer Fwd1, which contained an AvaII restriction site. In vitro
transcription of polymerase-encoding DNA was carried out in the presence of
γ-(2-azidoethyl)-ATP, which enabled attachment via click chemistry (SI Appendix,
Methods) of a 5’-hexynylated RNA primer (Primer). Following PAGE purification, 
50 nM polymerase-primer conjugates were mixed with 100 nM 3’-biotinylated 
RNA template (Tem2) and 75 nM cofactor oligodeoxynucleotide (Tem4), which 
were annealed by heating at 80 °C for 30 s, then cooling to 17 °C. The cofactor 
oligo, which binds to the new 5’ end of the polymerase, was found to increase 
polymerase activity and was included in all polymerase-catalyzed reactions. 

The annealed materials were added to a reaction mixture containing 4 mM of 
each NTP, various concentrations of MgCl2, 50 mM Tris-HCl (pH 8.3), and 0.05%
Tween-20, which were incubated at 25 °C for various times. The reaction was 
quenched by adding an equal volume of 250 mM EDTA, 500 mM NaCl, 5 mM 
Tris-HCl (pH 8.0), and 0.025% Tween-20, then mixed with 5 µg streptavidin
magnetic beads per pmol of biotinylated template and incubated with agitation 
at 23 °C for 1 h. The beads had been pre-blocked by incubating with 1 mg/mL 
tRNA. The beads were washed twice with urea buffer [8 M urea, 1 mM EDTA, and 
10 mM Tris-HCl (pH 8.0)], and the polymerase-primer conjugates were eluted 
from the RNA template in NaOH buffer (25 mM NaOH, 1 mM EDTA, and 0.05% 
Tween-20), then neutralized in 100 mM Tris-HCl (pH 7.5), ethanol precipitated, 
and the full-length products were purified by PAGE. 
The purified products were captured on magnetic beads that had been derivat- 
ized with DNA oligo Rev2, which is complementary to the 3’ end of the polymer- 
ase (SI Appendix, Methods). The bead-bound polymerase was reverse transcribed 
in situ. Then in a concerted reaction using both AvaII restriction enzyme and T4
DNA ligase, the 5’ substrate for the class I ligase (S1) was installed at the 3’ end
of the polymerase cDNA. The 3’ substrate (S2), which was 3’-biotinylated, was
added to the solution and the bead-bound ligase ribozymes were challenged to
join the two substrates in the presence of 60 mM MgCl2, 200 mM KCl, 0.6 mM
EDTA, and 50 mM Tris-HCl (pH 8.3) at 25 °C for 30 min.

The ligated products and corresponding polymerase cDNAs were cleaved off 
the beads using EcoRV (SI Appendix, Methods). The released molecules were 
captured on streptavidin magnetic beads, washed three times with NaOH buffer
and twice with urea buffer, eluted in formamide buffer [95% formamide, 100 nM 
EDTA, and 1 mM Tris-HCl (pH 8.0)] by heating at 95 °C for 5 min, then ethanol 
precipitated. The cDNAs were reverse transcribed using primer Fwd2, which is 
specific for substrate S2, then PCR amplified using primers Rev1 and Fwd3, the 
latter of which is specific for the ligation junction between S1 and S2. Nested 
PCR with primers Fwd4 and Rev1 provided materials to begin the next round of 
evolution. New mutations were introduced by error-prone PCR (31) after rounds 
53 through 58, 60, 61, 63, and 67 through 70. 


Directed Evolution of Hammerhead Ribozymes. Evolution was initiated 
with the Seq0 HHR+ template RNA (HHR+(0)), which was prepared by in vitro
transcription (SI Appendix, Methods). During each round of evolution, 100 nM
HHR+ template RNA was mixed with 100 nM polymerase ribozyme, 100 nM 
RNA Primer2 containing a photocleavable biotin moiety, and 200 nM cofactor 
oligo Tem4 when using the 71-89 polymerase. The RNAs were annealed, then 
incubated for 24 h under standard polymerase conditions, either with 200 mM 
MgCl2 for the 52-2 lineage or with 50 mM MgCl2 and 8% PEG8000 for the 71-89
lineage. The reaction was quenched with EDTA and the products were incubated 
with 10 µL streptavidin agarose beads per 200 pmol of RNA primer, which had
been pre-blocked with tRNA. The bead-bound HHR- RNA was washed three times with NaOH
buffer and twice with urea buffer, the beads were again blocked with 1 mg/mL 
tRNA, and the extension products were eluted by exposure to 350 nm UV irradi- 
ation in the presence of 25 mM NaCl, 1 mM EDTA, and 10 mM Tris-HCl (pH 8.0) 
at 23 °C for 30 min. The full-length products were purified by PAGE. 

One hundred nanomolar purified HHR- RNA was mixed with 100 nM RNA pol- 
ymerase ribozyme, 80 nM RNA Primer3 containing a biotinylated hammerhead 
substrate RNA at its 5’ end, and 1 µM blocking oligodeoxynucleotide Tem9, which
masked the hammerhead substrate during the polymerization reaction. The RNAs 
were annealed and reacted as described above for HHR- synthesis. The reactions 
were quenched with EDTA and the products were captured on streptavidin mag- 
netic beads that had been pre-blocked with tRNA, then washed three times with 
NaOH buffer and twice with urea buffer. The washed beads were suspended in 
20 mM MgCl2, 50 mM Tris-HCl (pH 8.0), and 0.05% Tween-20 and incubated at
25 °C for 30 min to elute functional hammerhead ribozymes. The supernatant
was passed through a 0.2-µm filter to remove residual magnetic beads, and the
RNAs were reverse transcribed and PCR amplified. New copies of HHR+ RNA were
transcribed from the amplified PCR products. A mock round 8 of evolution was 
performed using the HHR+ RNA from round 7 as input and carrying out reverse 
transcription and PCR amplification, but omitting the steps of RNA-catalyzed 
amplification and hammerhead cleavage. 


Biochemical Characterization of Evolved Hammerhead Ribozymes. The 
catalytic activity of the population of hammerhead ribozymes obtained after each 
round of evolution was assayed using 0.2 µM HHR+ RNA prepared by in vitro
transcription and 0.24 µM of a separate 5’-fluorescently labeled RNA substrate
(S3), which were incubated under the same conditions and for the same amount 
of time as used to select functional hammerheads. The products were separated 
by PAGE to determine the yield of cleaved products. 

The activity of individual hammerhead ribozymes was determined by their 
ability to cleave an attached RNA substrate under the same conditions as during 
RNA-catalyzed evolution. The hammerhead-substrate constructs were prepared 
by chemical synthesis, incorporating a Cy5 label (SI Appendix, Methods). The
constructs were captured on streptavidin magnetic beads, washed, and allowed 
to undergo substrate cleavage for 30 min. The cleaved RNA was collected in the 
supernatant, then the uncleaved RNA was eluted separately using formamide, 
and both were analyzed by PAGE to determine the yield of cleaved products. 

HHR+ RNAs that had been prepared by in vitro transcription were used as 
templates for the polymerase-catalyzed synthesis of HHR- by extension of RNA 
Primer2 under the same conditions as during RNA-catalyzed evolution. HHR- 
RNAs that were prepared using the 71-89 polymerase were similarly used as 
templates for the polymerase-catalyzed synthesis of HHR+ by extension of RNA 
Primer4. The extension products of both reactions were analyzed by PAGE to 
determine the yield of full-length products. 


Sequencing of Evolved Hammerhead Ribozymes. The PCR products from 
each round of RNA-catalyzed evolution were subject to deep sequencing, together 
with corresponding HHR- RNAs and HHR+ RNAs that had not undergone 
hammerhead cleavage and had been reverse transcribed and PCR amplified.
All three sets of cDNAs were prepared for sequencing by nested PCR amplifica- 
tion and were sequenced by the Salk Next Generation Sequencing Core on an 
Illumina NextSeq2000 with a 300-cycle paired-end run. The sequence reads were 
trimmed, filtered, and merged using a previously described analytical pipeline 
(17) (SI Appendix, Methods). 

Sequences of HHR- and pre-cleaved HHR+ RNAs from the first round of evo- 
lution were processed separately to determine the fidelity of a half- and full-cycle 
of RNA-catalyzed replication, respectively. The sequences were aligned to that of 
Seq0 using bowtie2 v2.4.2 (32), and the frequency of substitutions, insertions, 
and deletions was determined for each of the 27 nucleotide positions that were 
free to vary (Fig. 1C). 
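Once reads are aligned, the per-position error frequency and the mean per-nucleotide fidelity follow from simple counting. The sketch below starts after the bowtie2 step and assumes, for simplicity, that each read has already been converted into a string of the same length as Seq0's variable region, with '-' marking a deletion; insertions are not represented in this hypothetical format, so the sketch covers substitutions and deletions only.

```python
def per_position_error(reference, aligned_reads):
    """Fraction of reads that differ from the reference at each position.

    Assumes every read was already aligned and written as a string of the
    same length as the reference, with '-' marking a deleted nucleotide.
    Insertions are not represented in this simplified format."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    errors = [0] * len(reference)
    for read in aligned_reads:
        if len(read) != len(reference):
            raise ValueError("read length does not match reference")
        for i, (ref_nt, read_nt) in enumerate(zip(reference, read)):
            if read_nt != ref_nt:  # substitution or deletion
                errors[i] += 1
    return [count / len(aligned_reads) for count in errors]


def mean_fidelity(reference, aligned_reads):
    """Average per-nucleotide fidelity across all variable positions."""
    freqs = per_position_error(reference, aligned_reads)
    return 1 - sum(freqs) / len(freqs)


reads = ["ACGU", "AGGU", "AC-U", "ACGU"]
print(per_position_error("ACGU", reads))  # [0.0, 0.25, 0.25, 0.0]
print(mean_fidelity("ACGU", reads))       # 0.875
```

Applied to HHR- reads this would estimate half-cycle fidelity, and applied to pre-cleaved HHR+ reads it would estimate full-cycle fidelity, matching the two datasets described above.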

Sequences of pre-cleaved and cleaved HHR+ RNAs were processed using a 
custom Python script to determine the frequency of each distinct sequence in 
each round of evolution (SI Appendix, Methods). Sequences that included the
strictly conserved hammerhead nucleotides 5’-CUGANGA... GAAA-3’, together
with a Watson-Crick base pair at the base of stem I and either an R:Y or Y:Y pair
at the base of stem II, were identified as matching the biochemically defined
hammerhead motif (20, 29). A peak-finding algorithm (26) identified local peak 
sequences in each population (above a minimum threshold of 0.1%) and joined 
sequences within two mutations of the most abundant nearby peak in a greedy 
fashion. For each sequence, the Levenshtein distance was calculated to Seq0 and 
three reference sequences. 
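The clustering step can be illustrated with a Levenshtein distance and a greedy assignment. This is a simplified stand-in, not the cited peak-finding algorithm (26): here a sequence is treated as a peak if it exceeds the 0.1% threshold and is not within two edits of a more abundant peak, and every sequence is then joined to the most abundant peak within two edits. The function names and the toy frequencies are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def greedy_clusters(freqs, min_peak=0.001, radius=2):
    """Simplified greedy peak calling and clustering.

    freqs maps each distinct sequence to its frequency. Returns
    (clusters, unassigned), where clusters maps peak -> member sequences."""
    ordered = sorted(freqs, key=freqs.get, reverse=True)
    peaks = []
    for seq in ordered:
        if freqs[seq] < min_peak:
            break
        if all(levenshtein(seq, p) > radius for p in peaks):
            peaks.append(seq)
    clusters = {p: [] for p in peaks}
    unassigned = []
    for seq in ordered:
        # Peaks are stored in descending abundance, so the first hit is the most abundant.
        home = next((p for p in peaks if levenshtein(seq, p) <= radius), None)
        if home is None:
            unassigned.append(seq)
        else:
            clusters[home].append(seq)
    return clusters, unassigned


toy = {"AAAA": 0.5, "AAAU": 0.2, "UUUU": 0.2, "UUUA": 0.05, "GCGC": 0.0005}
print(greedy_clusters(toy))
```

Sequences left in `unassigned` correspond to the gray, unclustered points in Fig. 5.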

Statistical properties of the sequences in the evolving populations 
were determined using custom scripts in R. The average distance between 
sequences in each population of cleaved HHR+ RNAs was determined by 
randomly sampling 100,000 pairs of sequences from the list of distinct 
sequences, based on the frequency of the sequence in a given round, and 
averaging the Levenshtein distance between each pair divided by the number 
of variable nucleotides in the first member of the pair. The normalized Shannon 
population entropy, which was also determined from a sample of 100,000 
sequences, is defined as $\sum_i F_i \ln(F_i) / \ln(1/N)$, summed over all distinct
sequences, where $F_i$ is the frequency of each distinct sequence in the sample
and $N$ is the total number of sequences in the population. Sub-sampling of
sequences was carried out to ensure that entropy values were determined at 
the same sequencing depth (33). 
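Both population statistics can be sketched directly from their definitions above. This assumes the strings contain only the variable nucleotides (so dividing by the length of the first sequence matches "the number of variable nucleotides"), that pairs are drawn with replacement in proportion to frequency, and that entropy is computed on a list of individual reads; the function names, seed handling, and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def levenshtein(a, b):
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def mean_pairwise_distance(freqs, n_pairs=100_000, seed=0):
    """Average Levenshtein distance between frequency-weighted random pairs,
    each divided by the length of the first sequence of the pair."""
    rng = random.Random(seed)
    seqs = list(freqs)
    weights = [freqs[s] for s in seqs]
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.choices(seqs, weights=weights, k=2)
        total += levenshtein(a, b) / len(a)
    return total / n_pairs


def normalized_entropy(sample):
    """sum_i F_i ln(F_i) / ln(1/N) for a list of N sequence reads."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two reads")
    h = sum((c / n) * math.log(c / n) for c in Counter(sample).values())
    return h / math.log(1 / n)


# Sub-sample to a common depth before comparing rounds.
reads = ["AAAA"] * 50 + ["AAAU"] * 30 + ["UUUU"] * 20
print(normalized_entropy(random.Random(1).sample(reads, 50)))
print(mean_pairwise_distance({"AAAA": 0.6, "UUUU": 0.4}, n_pairs=2000))
```

The entropy is 0 for a clonal population and 1 when every read is distinct, which is why sub-sampling to equal depth matters: N appears in the normalization.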

Relative changes in frequency between round 7 and either round 8 or a mock
round 8 were calculated for each sequence that was present in round 7 at >0.01% 
frequency, corresponding to a sequencing depth of at least 50 reads. Phylogenetic 
trees rooted to Seq0 were generated using the neighbor-joining algorithm of
Saitou and Nei (27), encompassing all sequences that reached a maximum fre- 
quency >0.1% for the 52-2 lineage and >0.5% for the 71-89 lineage during 
rounds 3 to 8 of evolution. 


Yield of RNA Progeny After a Single Round of Evolution. A single mock 
round of hammerhead evolution was performed using the 71-89 polymerase 
and the Seq0, Seq2, Seq3, Seq5, Seq15, and Seq35 HHR+ template RNAs, both
individually and as an equimolar mixture. The resulting HHR+ RNAs were reverse
transcribed, PCR amplified, and sequenced. The frequencies of cleaved HHR+ RNA
sequences in the mixed population were fit to a multiple linear regression of the 
frequencies of sequences in each population that were propagated individually. 
Regression coefficients for the individually propagated populations were used 
to assign the fractional contribution of corresponding starting RNAs to progeny 
RNAs in the mixed population. The predicted abundance of progeny for a given 
starting RNA was calculated by multiplying the frequency of each RNA in the 
mixed starting population by a relative fitness value that was determined bio- 
chemically (Fig. 4). The predicted fraction of functional hammerheads among 
the replication products was estimated by the fraction of pre-cleaved HHR+ RNA 
sequences from each individually propagated population that are consistent with 
the hammerhead motif. 
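The regression step can be sketched as an ordinary least-squares fit without an intercept: the mixed-population frequency of each distinct sequence is modeled as a weighted sum of that sequence's frequencies in the individually propagated populations. This is a dependency-free illustration that solves the normal equations by Gaussian elimination; normalizing the coefficients to sum to 1 is an assumption about how they become fractional contributions, and the names and toy profiles are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-15:
            raise ValueError("singular system: profiles are not independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def parental_contributions(mixed, individual):
    """Least-squares fit of mixed-population frequencies to the profiles of
    individually propagated populations.

    mixed: {sequence: frequency}; individual: {parent: {sequence: frequency}}.
    Returns regression coefficients normalized to sum to 1."""
    parents = list(individual)
    seqs = sorted(set(mixed).union(*individual.values()))
    X = [[individual[p].get(s, 0.0) for p in parents] for s in seqs]
    y = [mixed.get(s, 0.0) for s in seqs]
    k = len(parents)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    total = sum(beta)
    return {p: coef / total for p, coef in zip(parents, beta)}


profiles = {
    "P1": {"AA": 0.8, "AU": 0.2},
    "P2": {"UU": 0.7, "UA": 0.3},
    "P3": {"GG": 0.6, "AU": 0.4},
}
mixed = {"AA": 0.4, "AU": 0.18, "UU": 0.21, "UA": 0.09, "GG": 0.12}
print(parental_contributions(mixed, profiles))  # approximately P1 0.5, P2 0.3, P3 0.2
```

With noisy sequencing data, an unconstrained fit can return small negative coefficients; a non-negative least-squares solver would be one way to avoid that, though the text does not say which variant was used.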


Scatterplot Analysis of Evolving Population. A two-dimensional map of 
the evolving populations in sequence space was constructed based on the 
Levenshtein distance from each distinct sequence in the population to four ref- 
erence sequences. The first two principal components of variation between refer- 
ence sequences reflect: for PC1, proximity to functional hammerhead sequences 
versus sequences that have diverged from the hammerhead motif; and for PC2, 
proximity to Seq15 versus Seq35 (SI Appendix, Methods). Based on their distances
to the four reference sequences, the positions of all distinct cleaved HHR+
sequences were projected onto the plane defined by PC1 and PC2 and plotted
using a custom script in R (SI Appendix, Methods). The density of hammerhead 
functionality across the two-dimensional plane was estimated as the local average 
fraction of all sequences in the cleaved HHR+ RNA populations that match the 
functional hammerhead motif. 
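One way to read the mapping described above is as a principal component analysis of the four reference sequences' distance profiles, with every other sequence projected through its own distances to those references. The sketch below follows that interpretation, which is an assumption rather than the authors' script; it uses power iteration with deflation to stay dependency-free (a library eigensolver such as `numpy.linalg.eigh` would be the usual choice), and the reference strings are hypothetical.

```python
import math


def levenshtein(a, b):
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def top_eigenvectors(C, k=2, iters=1000):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    n = len(C)
    C = [list(row) for row in C]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-uniform start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                raise ValueError("fewer than k non-zero components")
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(n)) for i in range(n))
        vectors.append(v)
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return vectors


def pca_map(references, sequences):
    """Project sequences onto PC1/PC2 of the references' distance profiles."""
    D = [[levenshtein(r, s) for s in references] for r in references]
    m = len(references)
    means = [sum(row[j] for row in D) / m for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in D]
    cov = [[sum(r[i] * r[j] for r in centered) / (m - 1) for j in range(m)]
           for i in range(m)]
    pc1, pc2 = top_eigenvectors(cov, k=2)
    coords = {}
    for s in sequences:
        d = [levenshtein(s, r) - mu for r, mu in zip(references, means)]
        coords[s] = (sum(a * b for a, b in zip(d, pc1)),
                     sum(a * b for a, b in zip(d, pc2)))
    return coords


refs = ["AAAA", "AAUU", "UUUU", "GGGG"]
for seq, (x, y) in pca_map(refs, ["AAAA", "UUUU", "AAAU"]).items():
    print(seq, round(x, 2), round(y, 2))
```

The sign of each principal component is arbitrary, so axes may need to be flipped to keep Seq0 on a consistent side of the plot.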